In this article we develop quantum algorithms for learning and testing juntas, i.e. Boolean functions which depend only on an unknown set of k out of n input variables. Our aim is to develop efficient algorithms:

• whose sample complexity has no dependence on n, the dimension of the domain the Boolean 
functions are defined over; 

• with no access to any classical or quantum membership ("black-box") queries. Instead, our 
algorithms use only classical examples generated uniformly at random and fixed quantum superpo- 
sitions of such classical examples; 

• which require only a few quantum examples but possibly many classical random examples 
(which are considered quite "cheap" relative to quantum examples). 

Our quantum algorithms are based on a subroutine FS which enables sampling according to the Fourier spectrum of f; the FS subroutine was used in earlier work of Bshouty and Jackson on quantum learning. Our results are as follows:

• We give an algorithm for testing k-juntas to accuracy $\epsilon$ that uses $O(k/\epsilon)$ quantum examples. This improves on the number of queries used by the best known classical algorithm.

• We establish the following lower bound: any FS-based k-junta testing algorithm requires $\Omega(\sqrt{k})$ queries.

• We give an algorithm for learning k-juntas to accuracy $\epsilon$ that uses $O(\epsilon^{-1} k \log k)$ quantum examples and $O(2^k \log(1/\epsilon))$ random examples. We show that this learning algorithm is close to optimal by giving a related lower bound.

Keywords: juntas, quantum query algorithms, quantum property testing, computational learning theory, 
quantum computation, lower bounds 



I. INTRODUCTION 
A. Motivation 

The field of computational learning theory deals with the abilities and limitations of algorithms that learn functions 
from data. Many models of how learning algorithms access data have been considered in the literature. Among 
these, two of the most prominent are via membership queries and via random examples. Membership queries are "black-box" queries; in a membership query, a learning algorithm submits an input x to an oracle and receives the value of f(x). In models of learning from random examples, each time the learning algorithm queries the oracle it receives a labeled example (x, f(x)), where x is independently drawn from some fixed probability distribution over the space of all possible examples. (We give precise definitions of these, and of all the learning models we consider, in Section II.)

In recent years a number of researchers have considered quantum variants of well-studied models in computational learning theory, see e.g. [1, 4, 8, 10, 15, 16, 28]. As we describe in Section II, models of learning from quantum membership queries and from fixed quantum superpositions of labeled examples (we refer to these as quantum examples) have been considered; such oracles have been studied in the context of quantum property testing as well [1, …, 12].
One common theme in the existing literature on quantum computational learning and testing is that these works study algorithms whose only access to the function is via some form of quantum oracle, such as the quantum membership oracle or the quantum example oracles mentioned above. For instance, [8] modifies the classical Harmonic Sieve algorithm of [17] so that it uses only uniform quantum examples to learn DNF formulas. [6] considers the problem of quantum property testing using quantum membership queries and gives an exponential separation between classical and quantum testers for certain concept classes. [4] studies the information-theoretic requirements of exact learning using quantum membership queries and of Probably Approximately Correct (PAC) learning using quantum examples. Many other articles, such as [1], could be added to this list.

As the problem of building large scale quantum computers remains a major challenge, it is natural to question the 
technical feasibility of large scale implementation of the quantum oracles considered in the literature. It is desirable to 
minimize the number of quantum (as opposed to classical) oracle queries or examples required by quantum algorithms. 
Thus motivated, in this paper we are interested in designing testing and learning algorithms with access to both 
quantum and classical sources of information (with the goal of minimizing the quantum resources required). 



B. Our results 

All of our positive results are based on a quantum subroutine due to [8], which we will refer to as an FS (Fourier Sample) oracle call. As explained in Section II, a call to the FS oracle yields a subset of {1, …, n} (this set should be viewed as a subset of the input variables $x_1, \dots, x_n$ of f) drawn according to the Fourier spectrum of the Boolean function f. As demonstrated by [8], such an oracle can be implemented using O(1) uniform quantum examples from a uniform distribution quantum example oracle. In fact, all of our algorithms will be purely classical apart from their use of the FS oracle. Thus, all of our algorithms can be implemented within the (uniform distribution) quantum PAC model first proposed by [8]. This model is a natural quantum extension of the classical PAC model introduced by Valiant [21], as described in Section II. We emphasize that no membership queries, classical or quantum, are used in our algorithms, only uniform quantum superpositions of labeled examples, and we recall that such uniform quantum examples cannot efficiently simulate even classical membership queries in general (see …).

Our approach of focusing only on the FS oracle allows us to abstract away from the intricacies of quantum com- 
putation, and renders our results useful in any setting in which an FS oracle can be provided to the user. In fact, 
learning and testing with FS oracle queries may be regarded as a new distinct model (which may possibly be weaker 
than the uniform distribution quantum example model). 

We are primarily interested in the information-theoretic requirements (i.e. the number of oracle calls needed) of the learning and testing problems that we discuss. We give upper and lower bounds for a range of learning and testing problems related to k-juntas; these are Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ that depend only on (an unknown subset of) at most k of the n input variables $x_1, \dots, x_n$. Juntas have been the subject of intensive research in learning theory and property testing in recent years, see e.g. [1, 11, …].

Our first result, in Section III, is a k-junta testing algorithm which uses $O(k/\epsilon)$ FS oracle calls. Our algorithm uses fewer queries than the best known classical junta testing algorithm, due to Fischer et al., which uses $O((k \log k)^2/\epsilon)$ membership queries. However, since the best lower bound known for classical membership query based junta testing (due to Chockler and Gutfreund [11]) is $\Omega(k)$, our result does not rule out the possibility that there might exist a classical membership query algorithm with the same query complexity.

To complement our FS-based testing algorithm, we establish a new lower bound: any k-junta testing algorithm that uses only an FS oracle requires $\Omega(\sqrt{k})$ calls to the FS oracle. This shows that our testing algorithm is not too far from optimal.

Finally, we consider algorithms that can both make FS queries and also access classical random examples. In Section IV we give an algorithm for learning k-juntas over $\{-1,1\}^n$ that uses $O(\epsilon^{-1} k \log k)$ FS queries and $O(2^k \log(\epsilon^{-1}))$ random examples. Since any classical learning algorithm requires $\Omega(2^k + \log n)$ examples (even if it is allowed to use membership queries), this result illustrates that it is possible to reduce the classical query complexity substantially (in particular, to eliminate the dependence on n) if the learning algorithm is also permitted to have some very limited quantum information. Moreover, most of the examples consumed by our algorithm are classical random examples, which are considered quite "cheap" relative to quantum examples. From another perspective, our result shows that for learning k-juntas, almost all the quantum examples used by the algorithm of Bshouty and Jackson [8] can in fact be converted into ordinary classical random examples. We show that our algorithm is close to best possible by giving a nearly matching lower bound.
C. Organization

In Section II we describe the models and problems we will consider and present some useful preliminaries from Fourier analysis and probability. Section III gives our results on testing juntas and Section IV gives our results on learning juntas.

II. PRELIMINARIES 
A. The problems and the models 

In keeping with standard terminology in learning theory, a concept f over $\{-1,1\}^n$ is a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, where $-1$ stands for True and $1$ stands for False. A concept class $C = \bigcup_{n \ge 1} C_n$ is a set of concepts, where $C_n$ consists of those concepts in C whose domain is $\{-1,1\}^n$. For ease of notation, throughout the paper we will omit the subscript in $C_n$ and simply write C to denote a collection of concepts over $\{-1,1\}^n$.

The concept class we will chiefly be interested in is the class of k-juntas. A Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is a k-junta if f depends only on at most k of its n input variables.

1. The problems 

We are interested in the following computational problems: 

PAC learning under the uniform distribution: Given any target concept $f \in C$, an $\epsilon$-learning algorithm for concept class C under the uniform distribution outputs a hypothesis function $h : \{-1,1\}^n \to \{-1,1\}$ which, with probability at least 2/3, agrees with f on at least a $1 - \epsilon$ fraction of the inputs in $\{-1,1\}^n$. This is a widely studied framework in the learning theory literature, in both classical (see for instance [13, …]) and quantum (see […]) versions.

Property testing: Let $f : \{-1,1\}^n \to \{-1,1\}$ be any Boolean function. A property testing algorithm for concept class C is an algorithm which, given access to f, behaves as follows:

• If $f \in C$ then the algorithm outputs Accept with probability at least 2/3;

• If f is $\epsilon$-far from every concept in C (i.e. for every concept $g \in C$, f and g differ on at least an $\epsilon$ fraction of all inputs), then the algorithm outputs Reject with probability at least 2/3.

The notion of property testing was first developed by [13] and […]. Quantum property testing was first studied by Buhrman et al. […], who gave the first example of an exponential separation between the query complexity of classical and quantum testers for a particular concept class.

Note that a learning or testing algorithm for C "knows" the class C but does not know the identity of the concept f. While our primary concern is the number of oracle calls that our algorithms use, we are also interested in time-efficient algorithms for testing and learning; for the concept class of k-juntas, these are algorithms running in $\mathrm{poly}(n, 2^k, \epsilon^{-1})$ time steps.

2. Classical oracles 

In order for learning and testing algorithms to gather information about the unknown concept f, they need an information source called an oracle. The number of times an oracle is queried by an algorithm is referred to as the query complexity. In some of our results, algorithms will be allowed access to more than one type of oracle.

In this paper we will consider the following types of oracles that provide classical information: 

Membership oracle MQ: For f a Boolean function, a membership oracle MQ(f) is an oracle which, when queried with input x, outputs the label f(x) assigned by f to x.

Uniform random example oracle EX: A query to the random example oracle EX(f) returns an ordered pair (x, f(x)), where x is drawn uniformly at random from the set $\{-1,1\}^n$ of all possible inputs.
Clearly a single call to an MQ oracle, on an input x chosen uniformly at random, can simulate the random example oracle EX. Indeed, EX oracle queries are considered "cheap" compared to membership queries. For example, in many settings it is possible to obtain random labeled examples but impossible to obtain the label of a particular desired example (consider prediction problems dealing with phenomena such as weather or financial markets). We note that the set of concept classes that are known to be efficiently PAC learnable from uniform random examples only is rather limited, see e.g. […, 23]. In contrast, there are known efficient algorithms that use membership queries to learn important function classes such as DNF (Disjunctive Normal Form) formulas [17].

3. Quantum oracles

We will consider the following quantum oracles, which are the natural quantum generalizations of membership 
queries and uniform random examples respectively. 

Quantum membership oracle QMQ: The quantum membership oracle QMQ(f) is the quantum oracle whose query acts on the computational basis states as follows:

$$\mathrm{QMQ}(f) : |x, b\rangle \mapsto |x, b \cdot f(x)\rangle, \quad \text{where } x \in \{-1,1\}^n \text{ and } b \in \{-1,1\}.$$

Uniform quantum examples QEX: The uniform quantum example oracle QEX(f) is the quantum oracle whose query acts on the computational basis state $|1^n, 1\rangle$ as follows:

$$\mathrm{QEX}(f) : |1^n, 1\rangle \mapsto \frac{1}{\sqrt{2^n}} \sum_{x \in \{-1,1\}^n} |x, f(x)\rangle.$$

The action of a QEX(f) query is undefined on other basis states, and an algorithm may only invoke the QEX(f) query on the basis state $|1^n, 1\rangle$.

It is clear that a QMQ oracle can simulate a QEX oracle or an MQ oracle, and a QEX oracle can simulate an EX 
oracle. 

The model of PAC learning with a uniform quantum example oracle was introduced by Bshouty and Jackson in [8]. Several researchers have also studied learning from the more powerful QMQ(f) oracle, see e.g. […]. Turning to property testing, we are not aware of prior work on quantum testing using only the QEX(f) oracle; instead, researchers have considered quantum testing algorithms that use the more powerful QMQ(f) oracle, see e.g. […].

B. Harmonic analysis of functions over $\{-1,1\}^n$

We will make use of the Fourier expansion of real-valued functions over $\{-1,1\}^n$. We write [n] to denote the set of variables $\{x_1, x_2, \dots, x_n\}$.

Consider the set of real-valued functions over $\{-1,1\}^n$ endowed with the inner product

$$\langle f, g \rangle = \mathbf{E}[fg] = \frac{1}{2^n} \sum_{x \in \{-1,1\}^n} f(x) g(x)$$

and induced norm $\|f\| = \sqrt{\langle f, f \rangle}$. For each $S \subseteq [n]$, let $\chi_S$ be the parity function $\chi_S(x) = \prod_{x_i \in S} x_i$. It is a well-known fact that the $2^n$ functions $\{\chi_S : S \subseteq [n]\}$ form an orthonormal basis for the vector space of real-valued functions over $\{-1,1\}^n$ with the above inner product. Consequently, every $f : \{-1,1\}^n \to \mathbb{R}$ can be expressed uniquely as

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x), \qquad \text{where } \hat{f}(S) = \langle f, \chi_S \rangle,$$

which we refer to as the Fourier expansion or Fourier transform of f. Alternatively, the values $\{\hat{f}(S) : S \subseteq [n]\}$ are called the Fourier coefficients or the Fourier spectrum of f.
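To make the definition concrete, here is a minimal brute-force sketch in Python (our own illustration, not part of the algorithms in this paper) that computes every coefficient $\hat{f}(S) = \mathbf{E}[f \chi_S]$ directly from the definition. The name `fourier_coefficients` and the encoding of S as a frozenset of 0-based variable indices are assumptions made for the example; the cost is $O(4^n)$, so it only suits small n.

```python
from itertools import product
from math import prod


def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {-1,1}^n -> R.

    Returns a dict mapping each subset S (a frozenset of 0-based variable
    indices) to hat f(S) = E_x[f(x) * chi_S(x)] with x uniform on {-1,1}^n.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for mask in range(2 ** n):
        S = frozenset(i for i in range(n) if (mask >> i) & 1)
        # chi_S(x) is the product of the coordinates in S (empty product = 1).
        total = sum(f(x) * prod(x[i] for i in S) for x in points)
        coeffs[S] = total / len(points)
    return coeffs


# Example: the parity x_1 x_2 on three variables (0-based indices 0 and 1).
coeffs = fourier_coefficients(lambda x: x[0] * x[1], 3)
print(coeffs[frozenset({0, 1})])  # 1.0
```

The parity $x_1 x_2$ has a single nonzero coefficient, on $S = \{x_1, x_2\}$, as orthonormality of the $\chi_S$ predicts.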

Parseval's Identity, which is an easy consequence of the orthonormality of the basis functions, relates the values of the coefficients to the values of the function:

Lemma II.1 (Parseval's Identity) For any $f : \{-1,1\}^n \to \mathbb{R}$, we have $\sum_{S \subseteq [n]} \hat{f}(S)^2 = \mathbf{E}[f^2]$. Thus for a Boolean-valued function, $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$.
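Lemma II.1 is easy to check numerically. The sketch below (illustrative only) computes the whole spectrum with the fast Walsh-Hadamard transform in $O(n 2^n)$ time and compares both sides of the identity. It assumes a truth-table encoding in which bit i of an index encodes $x_{i+1} = 1 - 2 \cdot \text{bit}$ and the result is indexed by the bitmask of S; these conventions and the names `fourier_spectrum` and `parseval_gap` are our own.

```python
def fourier_spectrum(truth_table):
    """All Fourier coefficients via the fast Walsh-Hadamard transform.

    truth_table[idx] = f(x), where bit i of idx encodes x_{i+1} = 1 - 2*bit
    (bit 0 -> +1, bit 1 -> -1). Entry S (a bitmask) of the result is hat f(S).
    """
    a = [float(v) for v in truth_table]
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                # Butterfly step: both right-hand sides use the old values.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]


def parseval_gap(truth_table):
    """sum_S hat f(S)^2 minus E[f^2]; zero up to rounding by Lemma II.1."""
    coeffs = fourier_spectrum(truth_table)
    lhs = sum(c * c for c in coeffs)
    rhs = sum(v * v for v in truth_table) / len(truth_table)
    return lhs - rhs


# Majority on three variables, as a truth table in the encoding above.
maj3 = [1 if sum(1 - 2 * ((idx >> i) & 1) for i in range(3)) > 0 else -1
        for idx in range(8)]
```

For a Boolean function such as `maj3` the squared coefficients sum to 1; for a real-valued table the two sides agree but need not equal 1.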
We will use the following simple and well-known fact:

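Fact II.2 (See [10]) For any $f : \{-1,1\}^n \to \{-1,1\}$ and any $g : \{-1,1\}^n \to \mathbb{R}$, we have

$$\Pr_x[f(x) \neq \mathrm{sgn}(g(x))] \le \mathbf{E}_x[(f(x) - g(x))^2] = \sum_{S \subseteq [n]} (\hat{f}(S) - \hat{g}(S))^2.$$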
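A quick numerical sanity check of Fact II.2 (our own sketch; the helper name `check_fact_ii2` is hypothetical) evaluates both sides for explicit f and g. Taking f to be majority on three variables and g its degree-1 part $(x_1 + x_2 + x_3)/2$ illustrates how a real-valued approximator g with small squared error yields a Boolean hypothesis $\mathrm{sgn}(g)$ with small error.

```python
from itertools import product


def check_fact_ii2(f, g, n):
    """Return (Pr_x[f(x) != sgn(g(x))], E_x[(f(x) - g(x))^2]), x uniform.

    sgn(0) is taken to be +1; Fact II.2 holds for either convention because
    (f(x) - g(x))^2 = 1 whenever g(x) = 0.
    """
    points = list(product((-1, 1), repeat=n))
    sgn = lambda t: 1 if t >= 0 else -1
    disagree = sum(f(x) != sgn(g(x)) for x in points) / len(points)
    sq_err = sum((f(x) - g(x)) ** 2 for x in points) / len(points)
    return disagree, sq_err


maj = lambda x: 1 if sum(x) > 0 else -1
deg1 = lambda x: sum(x) / 2          # degree-1 part of majority on 3 bits
print(check_fact_ii2(maj, deg1, 3))  # (0.0, 0.25)
```

Here $\mathrm{sgn}(g)$ equals majority exactly, so the left side is 0, while the right side is $\hat{f}(\{x_1, x_2, x_3\})^2 = 1/4$.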

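Recall that the influence of variable $x_i$ on a Boolean function f is the probability (taken over a uniformly random input x for f) that f changes its value when the i-th bit of x is flipped, i.e.

$$\mathrm{Inf}_i(f) = \Pr_x[f(x^{i \leftarrow -1}) \neq f(x^{i \leftarrow 1})],$$

where $x^{i \leftarrow b}$ denotes x with its i-th coordinate set to b. It is well known (see e.g. […]) that $\mathrm{Inf}_i(f) = \sum_{S \ni x_i} \hat{f}(S)^2$.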
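The following sketch (illustrative; the function names are ours) computes $\mathrm{Inf}_i(f)$ both from the definition and from the Fourier formula, so the two can be compared on small examples.

```python
from itertools import product
from math import prod


def influence(f, n, i):
    """Inf_i(f) = Pr_x[f(x with x_i = -1) != f(x with x_i = +1)]."""
    points = list(product((-1, 1), repeat=n))

    def with_coord(x, b):
        return x[:i] + (b,) + x[i + 1:]

    changes = sum(f(with_coord(x, -1)) != f(with_coord(x, 1)) for x in points)
    return changes / len(points)


def influence_from_spectrum(f, n, i):
    """Sum of hat f(S)^2 over all sets S that contain variable i."""
    points = list(product((-1, 1), repeat=n))
    total = 0.0
    for mask in range(2 ** n):
        if (mask >> i) & 1:
            S = [j for j in range(n) if (mask >> j) & 1]
            c = sum(f(x) * prod(x[j] for j in S) for x in points) / len(points)
            total += c * c
    return total


maj = lambda x: 1 if sum(x) > 0 else -1
print(influence(maj, 3, 0), influence_from_spectrum(maj, 3, 0))  # 0.5 0.5
```

For majority on three variables, the first variable matters exactly when the other two disagree, giving influence 1/2, which matches $\hat{f}(\{x_1\})^2 + \hat{f}(\{x_1, x_2, x_3\})^2 = 1/4 + 1/4$.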

C. Additional tools 

Fact II.3 (Data Processing Inequality) Let $X_1, X_2$ be two random variables over the same domain. For any (possibly randomized) algorithm A, one has

$$\|A(X_1) - A(X_2)\|_1 \le \|X_1 - X_2\|_1.$$

Let $S_1, S_2$ be random variables corresponding to sequences of draws taken from two different distributions over the same domain. By the above inequality, if $\|S_1 - S_2\|_1$ is known to be small, then any algorithm designed to distinguish whether the draws are made according to $S_1$ or $S_2$ can do only slightly better than random guessing: if each case occurs with probability 1/2, its success probability is at most $1/2 + \|S_1 - S_2\|_1/4$.
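As a small illustration of Fact II.3 (not taken from the paper), the sketch below models a randomized algorithm A as a row-stochastic matrix `channel` with `channel[x][y]` = Pr[A(x) = y], and checks that pushing two distributions through it does not increase their $\ell_1$ distance. The last helper evaluates the $1/2 + \|p - q\|_1/4$ bound on the success probability of telling p from q under equal priors. All names and numbers are hypothetical.

```python
def l1_distance(p, q):
    """||p - q||_1 for two distributions given as lists of probabilities."""
    return sum(abs(a - b) for a, b in zip(p, q))


def push_forward(p, channel):
    """Distribution of A(X) for X ~ p, where channel[x][y] = Pr[A(x) = y]."""
    out = [0.0] * len(channel[0])
    for x, px in enumerate(p):
        for y, pyx in enumerate(channel[x]):
            out[y] += px * pyx
    return out


def best_success_probability(p, q):
    """Optimal chance of telling p from q from one observation, equal priors."""
    return 0.5 + l1_distance(p, q) / 4


p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
channel = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]   # a randomized algorithm A
before = l1_distance(p, q)
after = l1_distance(push_forward(p, channel), push_forward(q, channel))
```

In this example the distance drops from 0.6 to 0.42; no choice of `channel` can make it grow.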
We will also use standard Chernoff bounds on tails of sums of independent random variables: 

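Fact II.4 (Additive Bound) Let $X_1, \dots, X_m$ be i.i.d. random variables with mean $\mu$ taking values in the range $[a, b]$. Then for all $\lambda > 0$ we have

$$\Pr\Big[\Big|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\Big| \ge \lambda\Big] \le 2\exp\Big(-\frac{2\lambda^2 m}{(b-a)^2}\Big).$$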
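Fact II.4 tells us how many samples suffice to estimate a mean to additive accuracy $\lambda$ with failure probability at most $\delta$: it is enough that $2\exp(-2\lambda^2 m/(b-a)^2) \le \delta$, i.e. $m \ge (b-a)^2 \ln(2/\delta)/(2\lambda^2)$. A minimal sketch of that arithmetic (helper names are ours):

```python
import math


def hoeffding_bound(m, lam, a, b):
    """Right-hand side of Fact II.4 for m samples with values in [a, b]."""
    return 2 * math.exp(-2 * lam ** 2 * m / (b - a) ** 2)


def samples_needed(lam, delta, a=0.0, b=1.0):
    """Smallest m for which the Fact II.4 bound is at most delta."""
    return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * lam ** 2))


m = samples_needed(0.1, 0.05)
print(m)  # 185
```

With $\lambda = 0.1$, $\delta = 0.05$ and values in [0, 1], 185 samples suffice, and 184 would not meet the bound.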

D. The Fourier sampling oracle: FS 

Definition II.5 Let $f : \{-1,1\}^n \to \{-1,1\}$ be a Boolean function. The Fourier sampling oracle FS(f) is the classical oracle which, at each invocation, returns each subset of variables $S \subseteq \{1, \dots, n\}$ with probability $\hat{f}(S)^2$, where $\hat{f}(S)$ denotes the Fourier coefficient corresponding to $\chi_S(x)$ as defined in Section II.B.

This oracle will play an important role in our algorithms. Note that by Parseval's Identity we have $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$, so the probability distribution over sets S indeed has total weight 1.
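For tiny n the FS oracle is easy to simulate classically, which is convenient for experimenting with FS-based algorithms. The sketch below (an illustration only, exponential in n; the names `fourier_weights` and `make_fs_oracle` are our own) computes the distribution $\hat{f}(S)^2$ by brute force and samples from it with `random.choices`.

```python
import random
from itertools import product
from math import prod


def fourier_weights(f, n):
    """Map each S (frozenset of 0-based indices) with hat f(S) != 0 to hat f(S)^2."""
    points = list(product((-1, 1), repeat=n))
    weights = {}
    for mask in range(2 ** n):
        S = frozenset(i for i in range(n) if (mask >> i) & 1)
        c = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        if c != 0:
            weights[S] = c * c
    return weights


def make_fs_oracle(f, n, rng=random):
    """Classical stand-in for FS(f): each call returns S with probability hat f(S)^2."""
    weights = fourier_weights(f, n)
    sets, probs = list(weights), list(weights.values())
    return lambda: rng.choices(sets, weights=probs, k=1)[0]


fs = make_fs_oracle(lambda x: x[0] * x[2], 3)   # the parity on x_1 and x_3
print(fs())  # frozenset({0, 2})
```

For a parity $\chi_T$ the oracle always returns T, since $\hat{\chi}_T(T) = 1$ and every other coefficient vanishes.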

In [8] Bshouty and Jackson describe a simple constant-size quantum network QSAMP, which has its roots in an idea from [9]. QSAMP allows sampling from the Fourier spectrum of a Boolean function using O(1) QEX oracle queries:

Fact II.6 (See [8]) For any Boolean function f, it is possible to simulate a draw from the FS(f) oracle with probability $1 - \delta$ using $O(\log \delta^{-1})$ queries to QEX(f).
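To see why O(1) QEX queries suffice per draw, here is a small pure-Python statevector sketch of the Hadamard-based idea behind such a network; the exact circuit of [8] may differ in details, and here labels are encoded as bits via $f(x) = (-1)^{f_{01}(x)}$ with x an n-bit integer. After a Hadamard gate on each of the n + 1 qubits of the state produced by one QEX(f) query, the label qubit reads 1 with probability exactly 1/2, and conditioned on that outcome the first n qubits read y with probability $\hat{f}(y)^2$, which is one FS(f) draw. Repeating on failure, $\lceil \log_2(1/\delta) \rceil$ attempts succeed with probability at least $1 - \delta$. The names `apply_hadamard`, `qsamp_probabilities` and `fs_draw` are hypothetical.

```python
import math
import random


def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to one qubit of a real statevector."""
    step = 1 << qubit
    out = list(state)
    for idx in range(len(state)):
        if not idx & step:
            a, b = state[idx], state[idx | step]
            out[idx] = (a + b) / math.sqrt(2)
            out[idx | step] = (a - b) / math.sqrt(2)
    return out


def qsamp_probabilities(f01, n):
    """probs[y][c] = Pr[first n qubits read y and the label qubit reads c].

    Starts from the QEX-style state 2^{-n/2} sum_x |x>|f01[x]> (basis index
    2*x + label) and applies a Hadamard gate to each of the n + 1 qubits.
    """
    state = [0.0] * 2 ** (n + 1)
    for x in range(2 ** n):
        state[2 * x + f01[x]] = 2 ** (-n / 2)
    for q in range(n + 1):
        state = apply_hadamard(state, q)
    return [[state[2 * y] ** 2, state[2 * y + 1] ** 2] for y in range(2 ** n)]


def fs_draw(f01, n, attempts, rng=random):
    """Simulated FS(f) draw: measure up to `attempts` fresh copies and return y
    from the first run whose label qubit reads 1 (None if every run fails)."""
    probs = qsamp_probabilities(f01, n)
    outcomes = [(y, c) for y in range(2 ** n) for c in (0, 1)]
    weights = [probs[y][c] for y, c in outcomes]
    for _ in range(attempts):
        y, c = rng.choices(outcomes, weights=weights, k=1)[0]
        if c == 1:
            return y
    return None
```

Each attempt corresponds to one QEX(f) query in the quantum setting; the classical simulation above is exponential in n and meant only to check the probabilities.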

All the algorithms we describe are actually classical algorithms that make FS queries. 

III. TESTING JUNTAS 

Fischer et al. studied the problem of testing juntas given black-box access (i.e., classical membership query access) to the unknown function f, using harmonic analysis and probabilistic methods. They gave several different algorithms with query complexity independent of n, the most efficient of which yields the following:

Theorem III.1 (See […, Theorem 6]) There is an algorithm that tests whether an unknown $f : \{-1,1\}^n \to \{-1,1\}$ is a k-junta using $O((k \log k)^2/\epsilon)$ membership queries.

Fischer et al. also gave a lower bound on the number of queries required for testing juntas, which was subsequently improved by Chockler and Gutfreund to the following:

Theorem III.2 (See [11]) Any algorithm that tests whether f is a k-junta or is 1/3-far from every k-junta must use $\Omega(k)$ membership queries.

We emphasize that both of these results concern algorithms with classical membership query access.
A. A testing algorithm using $O(k/\epsilon)$ FS oracle calls

In this section we describe a new testing algorithm that uses the FS oracle and prove the following theorem about 
its performance: 

Theorem III.3 There is an algorithm that tests the property of being a k-junta using $O(k/\epsilon)$ calls to the FS oracle.

As described in Section II, the algorithm can thus be implemented using $O(k/\epsilon)$ uniform quantum examples from QEX(f).

Proof: Consider the following algorithm A which has FS oracle access to an unknown function $f : \{-1,1\}^n \to \{-1,1\}$. Algorithm A first makes $10(k+1)/\epsilon$ calls to the FS oracle; let S denote the union of all the sets of variables received as responses to these oracle calls. Algorithm A then outputs "Accept" if $|S| \le k$ and outputs "Reject" if $|S| > k$.

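It is clear that if f is a k-junta then A outputs "Accept" with probability 1, since every set S with $\hat{f}(S) \neq 0$ consists only of relevant variables of f. To prove correctness of the test it suffices to show that if f is $\epsilon$-far from every k-junta then $\Pr[A \text{ outputs "Reject"}] \ge 2/3$.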

The argument is similar to the standard analysis of the coupon collector's problem. Let us view the set S as growing 
incrementally step by step as successive calls to the FS oracle are performed. 

Let $X_i$ be a random variable which denotes the number of FS queries that take place starting immediately after the $(i-1)$-st new variable is added to S, up through the draw when the i-th new variable is added to S. If the $(i-1)$-st and i-th new variables are obtained in the same draw, then $X_i = 0$. (For example, if the first three queries to the FS oracle return {1, 2, 4}, {2, 4}, {1, 4, 5, 6}, then we would have $X_1 = 1$, $X_2 = 0$, $X_3 = 0$, $X_4 = 2$, $X_5 = 0$.)
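The bookkeeping behind the variables $X_i$ can be written out directly; the sketch below (our own, with a hypothetical function name) reproduces the example above.

```python
def new_variable_gaps(queries):
    """Compute X_1, X_2, ... from a sequence of FS responses (sets of variables).

    X_i counts the FS queries made after the (i-1)-st new variable joined S, up
    to and including the query in which the i-th new variable appears; X_i = 0
    when the (i-1)-st and i-th new variables arrive in the same query.
    """
    seen, gaps, since_last = set(), [], 0
    for response in queries:
        since_last += 1
        new_vars = set(response) - seen
        if new_vars:
            gaps.append(since_last)                  # first new variable of this draw
            gaps.extend([0] * (len(new_vars) - 1))   # the others arrive in the same draw
            seen |= new_vars
            since_last = 0
    return gaps


print(new_variable_gaps([{1, 2, 4}, {2, 4}, {1, 4, 5, 6}]))  # [1, 0, 0, 2, 0]
```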

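Since f is $\epsilon$-far from every k-junta, we know that for any set T of $k' \le k$ variables, it must be the case that

$$\sum_{S \not\subseteq T} \hat{f}(S)^2 \ge \epsilon$$

(since otherwise, if we set $g = \sum_{S \subseteq T} \hat{f}(S) \chi_S$ and $h = \mathrm{sgn}(g)$ and use Fact II.2, we would have

$$\Pr_x[f(x) \neq h(x)] \le \sum_{S \not\subseteq T} \hat{f}(S)^2 < \epsilon,$$

which contradicts the fact that f is $\epsilon$-far from every k-junta, because h depends only on the variables in T). It follows that for each $0 \le i \le k$, if at the current stage of the construction of S we have $|S| = i$, then the probability that the next FS query yields a new variable outside of S is at least $\epsilon$. Consequently we have $\mathbf{E}[X_i] \le 1/\epsilon$ for each $1 \le i \le k+1$, and hence

$$\mathbf{E}[X_1 + \cdots + X_{k+1}] \le \frac{k+1}{\epsilon}.$$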

By Markov's inequality, the probability that $X_1 + \cdots + X_{k+1} < 10(k+1)/\epsilon$ is at least 9/10, and therefore with probability at least 9/10 it will be the case after $10(k+1)/\epsilon$ draws that $|S| > k$, and the algorithm will consequently output "Reject." ■
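A minimal classical simulation of Algorithm A, assuming a brute-force simulation of the FS oracle (only viable for small n) and rounding the number of calls up to $\lceil 10(k+1)/\epsilon \rceil$. The function names are ours; this is a sketch of the algorithm in the proof, not code from the paper.

```python
import math
import random
from itertools import product
from math import prod


def fourier_weights(f, n):
    """Map each S (frozenset of 0-based indices) with hat f(S) != 0 to hat f(S)^2."""
    points = list(product((-1, 1), repeat=n))
    weights = {}
    for mask in range(2 ** n):
        S = frozenset(i for i in range(n) if (mask >> i) & 1)
        c = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        if c != 0:
            weights[S] = c * c
    return weights


def algorithm_a(f, n, k, eps, rng=random):
    """Algorithm A: ceil(10(k+1)/eps) FS calls; accept iff the union of the
    returned sets has at most k variables. FS is simulated by brute force."""
    weights = fourier_weights(f, n)
    sets, probs = list(weights), list(weights.values())
    union = set()
    for S in rng.choices(sets, weights=probs, k=math.ceil(10 * (k + 1) / eps)):
        union |= S
    return "Accept" if len(union) <= k else "Reject"
```

A k-junta is always accepted, because every set returned by FS(f) lies inside the set of relevant variables; a parity on k + 1 variables is always rejected, since each FS call returns all k + 1 of them.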

Note that the $O(k/\epsilon)$ uniform quantum examples required by Algorithm A improve on the $O((k \log k)^2/\epsilon)$ query complexity of the best known classical algorithm. However, our result does not conclusively show that QEX queries are more powerful than classical membership queries for this problem, since it is conceivable that there could exist an as yet undiscovered $O(k/\epsilon)$ classical membership query algorithm.



B. Lower bounds for testing with a FS oracle 



1. A first approach 

As a first attempt to obtain a lower bound on the number of FS oracle calls required to test k-juntas, it is natural to consider the approach of Chockler and Gutfreund from [11]. To prove Theorem III.2, Chockler and Gutfreund show that any classical algorithm which can successfully distinguish between the following two probability distributions over black-box functions must use $\Omega(k)$ queries:

• Scenario I: The distribution $\mathcal{D}_{\mathrm{I}}$ is uniform over the set of all Boolean functions over n variables which do not depend on variables $k+2, \dots, n$.
• Scenario II: The distribution $\mathcal{D}_{\mathrm{II}}$ is defined as follows: to draw a function f from this distribution, first an index i is chosen uniformly from $\{1, \dots, k+1\}$, and then f is chosen uniformly from among those functions that do not depend on variables $k+2, \dots, n$ or on variable i.

FIG. 1: A decision tree computing the addressing function in the case r = 3. The left edge out of each node corresponds to the variable at the node taking value −1 and the right edge to the variable taking value 1.



The following observation shows that this approach will not yield a strong lower bound for algorithms that have access to an FS oracle:

Observation III.4 With O(log k) queries to an FS oracle, it is possible to determine w.h.p. whether a function f is drawn from Scenario I or Scenario II.



Proof: It is easy to see that a function drawn from Scenario I is simply a random function on the first k + 1 variables. The Fourier spectrum of random Boolean functions is studied in [26], where it is shown that sums of squares of Fourier coefficients of random Boolean functions are tightly concentrated around their expected values. In particular, Proposition 6 of [26] directly implies that for any fixed variable $x_i$, $i \in \{1, \dots, k+1\}$, we have:



$$\Pr\Big[\sum_{S \ni x_i} \hat{f}(S)^2 < \frac{1}{3}\Big] \le \exp\big(-2^{k+1}/2592\big).$$



Thus with overwhelmingly high probability, if f is drawn from Scenario I then each FS query will "expose" each variable $x_i$, $i \in \{1, \dots, k+1\}$, with probability at least 1/3. It follows that after O(log k) queries all k + 1 variables will have been exposed w.h.p., whereas a function drawn from Scenario II does not depend on variable i, so at most k variables can ever be exposed. Hence, by making O(log k) FS queries and simply checking whether or not k + 1 variables have been exposed, one can determine w.h.p. whether f is drawn from Scenario I or Scenario II. ■
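The concentration used in this proof is easy to observe empirically. The sketch below (ours; the seed and the choice k + 1 = 10 are arbitrary) draws a random Boolean function on k + 1 variables as in Scenario I and computes, for each variable, the probability $\sum_{S \ni x_i} \hat{f}(S)^2$ that a single FS query exposes it. For a Scenario II style function that ignores one variable, that variable's exposure probability is exactly 0.

```python
import random


def spectrum(table):
    """Fast Walsh-Hadamard transform; entry S (bitmask) is hat f(S).
    Bit i of a table index encodes x_{i+1} = 1 - 2*bit."""
    a = [float(v) for v in table]
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]


def exposure_probabilities(table, n):
    """For each variable i, the sum of hat f(S)^2 over S containing i, i.e. the
    chance that one FS query exposes that variable."""
    coeffs = spectrum(table)
    return [sum(c * c for S, c in enumerate(coeffs) if (S >> i) & 1)
            for i in range(n)]


rng = random.Random(0)
m = 10                                      # m = k + 1 variables
scenario_1 = [rng.choice((-1, 1)) for _ in range(2 ** m)]
base = [rng.choice((-1, 1)) for _ in range(2 ** m)]
ignored = 3                                 # Scenario II style: ignore x_4
scenario_2 = [base[idx & ~(1 << ignored)] for idx in range(2 ** m)]
```

For a Boolean function these exposure probabilities are exactly the influences $\mathrm{Inf}_i(f)$, which for a random function cluster around 1/2.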
Thus we must adopt a more sophisticated approach to prove a strong lower bound on FS oracle algorithms. 



2. An $\Omega(\sqrt{k})$ lower bound for FS oracle algorithms

Our main result in this section is the following theorem: 

Theorem III.5 Any algorithm that has FS oracle access to an unknown f must use $\Omega(\sqrt{k})$ oracle calls to test whether f is a k-junta.

Proof: Let k be such that $k = r + 2^{r-1}$ for some positive integer r. We let R denote $2^r$. The addressing function on r + R variables has r "addressing variables," which we shall denote $x_1, \dots, x_r$, and $R = 2^r$ "addressee variables," which we denote $z_0, \dots, z_{R-1}$. The output of the function is the value of variable $z_x$, where the "address" x is the element of $\{0, \dots, R-1\}$ whose binary representation is given by $x_1 \dots x_r$. Figure 1 depicts a decision tree that computes the addressing function in the case r = 3. Formally, the Addressing function $\mathrm{Addressing} : \{-1,1\}^{r+R} \to \{-1,1\}$ is defined as follows:

$$\mathrm{Addressing}(x_1, x_2, \dots, x_r, z_0, z_1, \dots, z_{R-1}) = z_x,$$

where $x = \big(\frac{1-x_1}{2}\big) \circ \big(\frac{1-x_2}{2}\big) \circ \cdots \circ \big(\frac{1-x_r}{2}\big)$ in binary form and $\circ$ denotes binary concatenation.
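A direct transcription of this definition (illustrative; inputs are ±1 values and the name `addressing` is ours):

```python
def addressing(xs, zs):
    """Addressing(x_1, ..., x_r, z_0, ..., z_{R-1}) = z_a, where the address a
    has binary digits (1 - x_1)/2, ..., (1 - x_r)/2 (most significant first)."""
    if len(zs) != 2 ** len(xs):
        raise ValueError("need exactly 2^r addressee values")
    address = 0
    for x in xs:                      # each x is +1 or -1
        address = 2 * address + (1 - x) // 2
    return zs[address]


print(addressing((-1, 1), ("z0", "z1", "z2", "z3")))  # z2
```

For r = 2, the input $x_1 = -1, x_2 = 1$ gives address digits 1, 0, i.e. address 2, so the output is $z_2$.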
Intuitively, the Addressing function will be useful for us because, as we will see, its Fourier spectrum is "spread out" over the R addressee variables; this will make it difficult to distinguish the Addressing function (which is not a k-junta since k = r + R/2, and as we shall see is in fact far from every k-junta) from a variant which is a k-junta.

Let $x_1, \dots, x_r, y_0, \dots, y_{n-r-1}$ be the n variables that our Boolean functions are defined over. We now define two distributions $\mathcal{D}_{\mathrm{reject}}$, $\mathcal{D}_{\mathrm{accept}}$ over functions on these variables.

The distribution $\mathcal{D}_{\mathrm{reject}}$ is defined as follows: to make a draw from $\mathcal{D}_{\mathrm{reject}}$,

1. First uniformly choose a subset T of R variables from $\{y_0, \dots, y_{n-r-1}\}$;

2. Next, replace the variables $z_0, \dots, z_{R-1}$ in the function

$$\mathrm{Addressing}(x_1, \dots, x_r, z_0, \dots, z_{R-1})$$

with the variables in T (choosing the variables from T in a uniformly random order). Return the resulting function.

Note that step (2) in the description of making a draw from $\mathcal{D}_{\mathrm{reject}}$ above corresponds to placing the variables in T uniformly at the leaves of the decision tree for Addressing (see Figure 1). Equivalently, if we write $f_\tau$ to denote the following function over n variables

$$f_\tau(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \mathrm{Addressing}(x_1, x_2, \dots, x_r, y_{\tau(0)}, y_{\tau(1)}, \dots, y_{\tau(R-1)}), \qquad \text{(III.1)}$$

then a draw from $\mathcal{D}_{\mathrm{reject}}$ is a function chosen uniformly at random from the set $C_{\mathrm{reject}} = \{f_\tau\}$, where $\tau$ ranges over all permutations of $\{0, \dots, n-r-1\}$.

It is clear that every function in $C_{\mathrm{reject}}$ (the support of $\mathcal{D}_{\mathrm{reject}}$) depends on r + R variables and thus is not a k-junta. In fact, every function in $C_{\mathrm{reject}}$ is far from being a k-junta:

Lemma III.6 Every f that has nonzero probability under $\mathcal{D}_{\mathrm{reject}}$ is 1/6-far from every k-junta.

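Proof: Fix any such f and let g be any k-junta. Since g has at most $k = r + R/2$ relevant variables, at least $R/2 - r$ of the R "addressee" variables of f are not relevant variables for g. The value of f is determined by one of these addressee variables on a $(R/2 - r)/R$ fraction of all inputs, which is at least 1/3 once $R \ge 6r$ (i.e. for $r \ge 5$). On such inputs the error rate of g relative to f will be precisely 1/2, since that addressee variable is uniform and independent of all the variables g depends on; hence f and g disagree on at least a 1/6 fraction of inputs. ■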
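For small r the conclusion of Lemma III.6 can be checked by brute force. The sketch below (our own illustration) computes the distance from $f_\tau$ to the nearest k-junta by trying every set T of k variables and, for each T, the best function of those variables (the majority value of f on each restriction). We use r = 2 and $n = r + R$; this is far smaller than the regime the lemma is aimed at, so it is only a numerical illustration.

```python
import random
from itertools import combinations, product


def addressing(xs, zs):
    address = 0
    for x in xs:                     # digit (1 - x)/2, most significant first
        address = 2 * address + (1 - x) // 2
    return zs[address]


def distance_to_nearest_junta(f, n, k):
    """min over k-sets T of Pr_x[f(x) != g(x)] for the best g depending on T.

    For a fixed T, the best g outputs the majority value of f on each
    restriction of the variables in T, so its error is the minority count.
    """
    points = list(product((-1, 1), repeat=n))
    best = 1.0
    for T in combinations(range(n), k):
        buckets = {}
        for x in points:
            buckets.setdefault(tuple(x[i] for i in T), []).append(f(x))
        errors = sum(min(v.count(1), v.count(-1)) for v in buckets.values())
        best = min(best, errors / len(points))
    return best


r = 2
R = 2 ** r
k = r + R // 2
n = r + R                            # coordinates: x_1..x_r, then y_0..y_{R-1}
tau = random.Random(0).sample(range(n - r), n - r)   # a random permutation


def f_tau(v):
    return addressing(v[:r], [v[r + tau[i]] for i in range(R)])


dist = distance_to_nearest_junta(f_tau, n, k)
```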



Fix any function $f_\tau$ in $C_{\mathrm{reject}}$. We now give an expression for the Fourier representation of $f_\tau$. The expression is obtained by viewing $f_\tau$ as a sum of R subfunctions, one for each leaf of the decision tree, where each subfunction takes the appropriate nonzero value on inputs which reach the corresponding leaf and takes value 0 on all other inputs:



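$$f_\tau(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \sum_{i=0}^{R-1} y_{\tau(i)} \Big(\frac{1 + (-1)^{i_1} x_1}{2}\Big)\Big(\frac{1 + (-1)^{i_2} x_2}{2}\Big) \cdots \Big(\frac{1 + (-1)^{i_r} x_r}{2}\Big) \qquad \text{(III.2)}$$

$$= \frac{1}{R} \sum_{i=0}^{R-1} \sum_{X \subseteq \{x_1, \dots, x_r\}} (-1)^{\sum_{x_j \in X} i_j}\, y_{\tau(i)} \chi_X, \qquad \text{(III.3)}$$

where $i_1 i_2 \dots i_r$ is the binary expansion of i.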

Note that whenever $\frac{1-x_1}{2} = i_1, \frac{1-x_2}{2} = i_2, \dots, \frac{1-x_r}{2} = i_r$, the RHS of Equation (III.2) has precisely one nonzero term, which is $y_{\tau(i)}$. This is because the rest of the terms are annihilated: each of them is indexed by some $i' \neq i$, so there is some index j with $\frac{1-x_j}{2} = i_j = 1 - i'_j$, which makes $\frac{1 + (-1)^{i'_j} x_j}{2} = 0$. Consequently this sum gives rise to exactly the Addressing function in Equation (III.1), which is defined as $f_\tau$, and consequently the equality in Equation (III.2) follows. Equation (III.3) follows easily by expanding the product in (III.2).
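Equation (III.3) says that $f_\tau$ has exactly $R \cdot 2^r = R^2$ nonzero Fourier coefficients, each equal to $\pm 1/R$ and each supported on one addressee variable $y_{\tau(i)}$ together with a subset X of the addressing variables, with sign $(-1)^{\sum_{x_j \in X} i_j}$. The sketch below (our own, for r = 2, using a fast Walsh-Hadamard transform and a bitmask encoding of sets as assumptions) builds the truth table of $f_\tau$ and computes its spectrum so these claims can be checked numerically.

```python
import random


def addressing(xs, zs):
    address = 0
    for x in xs:                     # digit (1 - x)/2, most significant first
        address = 2 * address + (1 - x) // 2
    return zs[address]


def spectrum(table):
    """Fast Walsh-Hadamard transform; entry S (bitmask) is hat f(S)."""
    a = [float(v) for v in table]
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]


r = 2
R = 2 ** r
n = r + R                            # bits 0..r-1: x_1..x_r; bit r + j: y_j
tau = random.Random(1).sample(range(R), R)


def f_tau(v):
    return addressing(v[:r], [v[r + tau[i]] for i in range(R)])


# Truth table: bit c of idx encodes coordinate c, with value 1 - 2*bit.
table = [f_tau(tuple(1 - 2 * ((idx >> c) & 1) for c in range(n)))
         for idx in range(2 ** n)]
coeffs = spectrum(table)
support = [S for S, c in enumerate(coeffs) if abs(c) > 1e-12]
```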
Now we turn to $\mathcal{D}_{\mathrm{accept}}$.

The distribution $\mathcal{D}_{\mathrm{accept}}$ is defined as follows: to make a draw from $\mathcal{D}_{\mathrm{accept}}$,

1. First uniformly choose a subset T of R/2 variables from $\{y_0, \dots, y_{n-r-1}\}$;

2. Next, replace the variables $z_0, \dots, z_{R/2-1}$ in the function

$$\mathrm{Addressing}(x_1, \dots, x_r, z_0, \dots, z_{R-1})$$

with the variables in T (choosing the variables from T in a uniformly random order).

3. Finally, for each $i = 0, \dots, R/2 - 1$ do the following: if variable $y_j$ was used to replace variable $z_i$ in the previous step, let $s_i$ be a fresh uniform random ±1 value and replace variable $z_{R-1-i}$ with $s_i y_j$. Return the resulting function.

Observe that for any integer $0 \le i < R/2$ with binary expansion $i = i_1 \circ i_2 \circ \cdots \circ i_r$, the binary expansion of $R - 1 - i$ is $\bar{i}_1 \circ \bar{i}_2 \circ \cdots \circ \bar{i}_r$, where $\bar{i}_j = 1 - i_j$. Thus steps (2) and (3) in the description of making a draw from $\mathcal{D}_{\mathrm{accept}}$ may be restated as follows in terms of the decision tree representation for Addressing:

2'. Place the variables $y_j \in T$ randomly among the leaves of the decision tree with index less than R/2.

3'. For each variable $y_j \in T$ placed at the leaf with index $i = i_1 \circ i_2 \circ \cdots \circ i_r < R/2$ above, toss a ±1-valued coin $s_i$ and place $s_i y_j$ at the antipodal leaf location with index $\bar{i}_1 \circ \bar{i}_2 \circ \cdots \circ \bar{i}_r = R - 1 - i$.

Equivalently, if we write $g_{\tau,s}$ to denote the following function over n variables

$$g_{\tau,s}(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \mathrm{Addressing}(x_1, \dots, x_r, y_{\tau(0)}, \dots, y_{\tau(R/2-1)}, s_{R/2-1} y_{\tau(R/2-1)}, \dots, s_0 y_{\tau(0)}), \qquad \text{(III.4)}$$

then a draw from $\mathcal{D}_{\mathrm{accept}}$ is a function chosen uniformly at random from the set $C_{\mathrm{accept}} = \{g_{\tau,s}\}$, where $\tau$ ranges over all permutations of $\{0, \dots, n-r-1\}$ and s ranges over all of $\{-1,1\}^{R/2}$. It is clear that every function in $C_{\mathrm{accept}}$ depends on at most $r + R/2 = k$ variables, and thus is indeed a k-junta.
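The following sketch (ours; the names and the list-of-leaves representation are assumptions) draws $(\tau, s)$ as in steps 1 to 3 and records which signed variable sits at each leaf, with leaf $R - 1 - i$ holding $s_i y_{\tau(i)}$. It lets one check that only R/2 distinct y variables appear, so $g_{\tau,s}$ is a k-junta, and that $R - 1 - i$ is the bitwise complement of i on r bits.

```python
import random
from itertools import product


def addressing(xs, zs):
    address = 0
    for x in xs:                     # digit (1 - x)/2, most significant first
        address = 2 * address + (1 - x) // 2
    return zs[address]


def sample_accept_leaves(r, n, rng=random):
    """Draw (tau, s) as in D_accept; leaves[i] = (sign, j) means leaf i holds sign * y_j."""
    R = 2 ** r
    tau = rng.sample(range(n - r), n - r)        # random permutation of {0, ..., n-r-1}
    s = [rng.choice((-1, 1)) for _ in range(R // 2)]
    leaves = [None] * R
    for i in range(R // 2):
        leaves[i] = (1, tau[i])                  # step 2: y_{tau(i)} at leaf i < R/2
        leaves[R - 1 - i] = (s[i], tau[i])       # step 3: s_i y_{tau(i)} at leaf R-1-i
    return leaves, s


def evaluate(leaves, r, v):
    """g_{tau,s}(v) for v = (x_1, ..., x_r, y_0, ..., y_{n-r-1})."""
    return addressing(v[:r], [sign * v[r + j] for sign, j in leaves])


def ignores_unused_ys(leaves, r, n):
    """Brute-force check that flipping any y placed at no leaf never changes g."""
    used = {j for _, j in leaves}
    for v in product((-1, 1), repeat=n):
        for j in set(range(n - r)) - used:
            w = list(v)
            w[r + j] = -w[r + j]
            if evaluate(leaves, r, w) != evaluate(leaves, r, v):
                return False
    return True


r = 3
R = 2 ** r
n = r + 6                                        # n - r = 6 >= R/2 candidate y's
leaves, s = sample_accept_leaves(r, n, random.Random(2))
used = {j for _, j in leaves}                    # indices of y's placed at some leaf
```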

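By considering the contribution to the Fourier spectrum from each pair of leaves $i, \bar{i}$ of the decision tree (where $\bar{i} = R - 1 - i$ has bits $\bar{i}_j = 1 - i_j$), we obtain the following expression for the Fourier expansion of each function in the support of $\mathcal{D}_{\mathrm{accept}}$:

$$g_{\tau,s}(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \sum_{i=0}^{R/2-1} y_{\tau(i)} \Big(\frac{1 + (-1)^{i_1} x_1}{2}\Big) \cdots \Big(\frac{1 + (-1)^{i_r} x_r}{2}\Big) + \sum_{i=0}^{R/2-1} s_i\, y_{\tau(i)} \Big(\frac{1 + (-1)^{\bar{i}_1} x_1}{2}\Big) \cdots \Big(\frac{1 + (-1)^{\bar{i}_r} x_r}{2}\Big) \qquad \text{(III.5)}$$

$$= \sum_{i=0}^{R/2-1} \begin{cases} \dfrac{2}{R} \displaystyle\sum_{X \subseteq \{x_1, \dots, x_r\},\ |X| \text{ even}} (-1)^{\sum_{x_j \in X} i_j}\, y_{\tau(i)} \chi_X & \text{if } s_i = 1, \\[2ex] \dfrac{2}{R} \displaystyle\sum_{X \subseteq \{x_1, \dots, x_r\},\ |X| \text{ odd}} (-1)^{\sum_{x_j \in X} i_j}\, y_{\tau(i)} \chi_X & \text{if } s_i = -1. \end{cases} \qquad \text{(III.6)}$$

The last step uses $(-1)^{\bar{i}_j} = (-1)^{1 - i_j} = -(-1)^{i_j}$: expanding both products for the pair $i, \bar{i}$ gives $\frac{1}{R} \sum_X (-1)^{\sum_{x_j \in X} i_j} \big(1 + s_i (-1)^{|X|}\big)\, y_{\tau(i)} \chi_X$, so the terms with |X| odd cancel when $s_i = 1$ and the terms with |X| even cancel when $s_i = -1$.

Just as with Equation (III.2), whenever $\frac{1-x_1}{2} = i_1, \frac{1-x_2}{2} = i_2, \dots, \frac{1-x_r}{2} = i_r$, the sum on the RHS of Equation (III.5) has precisely one nonzero term, which is $y_{\tau(i)}$ if $i < R/2$ and $s_{R-1-i}\, y_{\tau(R-1-i)}$ if $i \ge R/2$. Therefore this sum gives rise to exactly the Addressing function in Equation (III.4), which is defined as $g_{\tau,s}$, and consequently the equality in Equation (III.5) follows.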

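It follows that for each $g_{\tau,s}$ in the support of $\mathcal{D}_{\mathrm{accept}}$ and for any fixed $y_j$, all elements of the set $\{S : y_j \in S \text{ and } \hat{g}_{\tau,s}(S) \neq 0\}$ have the same parity, in the sense that the sizes $|S \cap \{x_1, \dots, x_r\}|$ are either all even or all odd. Moreover, when draws from $\mathcal{D}_{\mathrm{accept}}$ are considered, this odd/even parity is independent and uniformly random for every distinct $y_j$.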
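Equation (III.6) and this parity property can also be checked numerically. In the sketch below (our own, with r = 3 and a bitmask encoding of sets as assumptions), every nonzero coefficient of $g_{\tau,s}$ should have magnitude 2/R, there should be $R^2/4$ of them, and for each $y_j$ the size of the X part should be even if the corresponding coin is +1 and odd if it is −1.

```python
import random


def addressing(xs, zs):
    address = 0
    for x in xs:                     # digit (1 - x)/2, most significant first
        address = 2 * address + (1 - x) // 2
    return zs[address]


def spectrum(table):
    """Fast Walsh-Hadamard transform; entry S (bitmask) is hat f(S)."""
    a = [float(v) for v in table]
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]


r = 3
R = 2 ** r
n = r + R // 2                       # bits 0..r-1: x_1..x_r; bit r + j: y_j
rng = random.Random(3)
tau = rng.sample(range(R // 2), R // 2)
s = [rng.choice((-1, 1)) for _ in range(R // 2)]


def g(v):
    zs = [None] * R
    for i in range(R // 2):
        zs[i] = v[r + tau[i]]                     # leaf i < R/2
        zs[R - 1 - i] = s[i] * v[r + tau[i]]      # antipodal leaf R-1-i
    return addressing(v[:r], zs)


table = [g(tuple(1 - 2 * ((idx >> c) & 1) for c in range(n)))
         for idx in range(2 ** n)]
coeffs = spectrum(table)
support = [S for S, c in enumerate(coeffs) if abs(c) > 1e-12]
```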

Now we are ready to prove Theorem III.5. Recall that an FS oracle query returns S with probability $\hat{f}(S)^2$ for every subset S of input variables to the function. By Equations (III.3) and (III.6), for any f in $C_{\mathrm{accept}}$ or $C_{\mathrm{reject}}$, its FS oracle will return a set of the form $\{y_j\} \cup X$ with $j = \tau(i)$ for some i and $X \subseteq \{x_1, \dots, x_r\}$; we write such a response as the pair $(y_j, X)$.

Let us define a set $\mathcal{T}$ of "typical" outcomes from FS oracle queries. Fix any $N = o(\sqrt{k})$, and let $\mathcal{T}$ denote the set of all sequences $((y_{j_1}, X_1), \dots, (y_{j_N}, X_N))$ of length N which have the property that no $y_j$ occurs more than once among $y_{j_1}, \dots, y_{j_N}$.

Note that for any fixed $f_\tau$ in the support of $\mathcal{D}_{\mathrm{reject}}$, every non-zero Fourier coefficient $\widehat{f_\tau}(S)$ satisfies $|\widehat{f_\tau}(S)|^2 = \frac{1}{2^{2r}} = \frac{1}{R^2}$ due to Equation (III.3). Therefore after $f_\tau$ is drawn, for any fixed $y_j$ the probability of receiving a response of the form $(y_j, X)$ as the outcome of an FS query is either

• $0$, if $f_\tau$ is not a function of $y_j$, i.e. $j \notin \{\tau(0),\dots,\tau(R-1)\}$; or

• $\frac{1}{R}$, if $j \in \{\tau(0),\dots,\tau(R-1)\}$. This is because each of the $2^r = R$ responses $(y_j, X)$ occurs with probability $\frac{1}{R^2}$.

Similarly, for any fixed $g_{\tau,s}$ in the support of $\mathcal{D}_{\mathrm{accept}}$, every non-zero Fourier coefficient $\widehat{g_{\tau,s}}(S)$ satisfies $|\widehat{g_{\tau,s}}(S)|^2 = \frac{1}{2^{2r-2}} = \frac{4}{R^2}$ due to Equation (III.6). Therefore after $g_{\tau,s}$ is drawn, for any fixed $y_j$ the probability of receiving a response of the form $(y_j, X)$ as the outcome of an FS query is either

• $0$, if $g_{\tau,s}$ is not a function of $y_j$, i.e. $j \notin \{\tau(0),\dots,\tau(R/2-1)\}$; or

• $\frac{2}{R}$, if $j \in \{\tau(0),\dots,\tau(R/2-1)\}$. This is because each of the $2^{r-1} = R/2$ responses $(y_j, X)$ occurs with probability $\frac{4}{R^2}$.
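The two response distributions can also be compared numerically. The sketch below computes the exact Fourier spectrum with a fast Walsh-Hadamard transform and sums $|\hat f(S)|^2$ over the sets containing each $y_j$. We assume that $f_\tau$ outputs $y_{\tau(i)}$ at leaf $i$ for all $R$ leaves (Equation (III.3) is not reproduced in this section), which is consistent with the coefficient magnitudes stated above; all function names are hypothetical. For $r = 2$ the expected masses are $1/R = 1/4$ and $2/R = 1/2$.

```python
import random


def walsh_hadamard(values):
    """All Fourier coefficients from values[z], z = 0..2^n - 1 (bit k of z set <=> z_k = -1).

    Entry S of the result is 2^{-n} * sum_z f(z) * (-1)^{popcount(S & z)} = hat f(S).
    """
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return [c / len(a) for c in a]


def leaf_index(x):
    i = 0
    for xj in x:
        i = 2 * i + (1 - xj) // 2
    return i


def f_reject(x, y, tau):
    # assumed form of f_tau: leaf i outputs y_{tau(i)} for every one of the R leaves
    return y[tau[leaf_index(x)]]


def g_accept(x, y, tau, s):
    R = 2 ** len(x)
    i = leaf_index(x)
    return y[tau[i]] if i < R // 2 else s[R - 1 - i] * y[tau[R - 1 - i]]


def y_marginals(func, r, m):
    """Probability that one FS query returns a set containing y_j, for each j with non-zero mass."""
    n = r + m
    points = [[-1 if z >> k & 1 else 1 for k in range(n)] for z in range(2 ** n)]
    spectrum = walsh_hadamard([func(p) for p in points])
    mass = {}
    for S, c in enumerate(spectrum):
        if c != 0:
            (j,) = [k - r for k in range(r, n) if S >> k & 1]   # exactly one y-variable per set
            mass[j] = mass.get(j, 0.0) + c * c
    return mass


def demo(r=2, m=6, seed=0):
    rng = random.Random(seed)
    R = 2 ** r
    tau = rng.sample(range(m), m)                     # needs m >= R for f_tau
    s = [rng.choice((1, -1)) for _ in range(R // 2)]
    mf = y_marginals(lambda p: f_reject(p[:r], p[r:], tau), r, m)
    mg = y_marginals(lambda p: g_accept(p[:r], p[r:], tau, s), r, m)
    return tau, mf, mg
```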

Now let us consider the probability of obtaining a sequence from $T$ under each scenario.

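• If the function is drawn from $\mathcal{D}_{\mathrm{reject}}$: the probability is at least

$$\Big(1-\frac{1}{R}\Big)\Big(1-\frac{2}{R}\Big)\cdots\Big(1-\frac{N}{R}\Big) \ge 1-\frac{N(N+1)}{2R} = 1-o(1) \quad \text{[by the Birthday Paradox].}$$

• If the function is drawn from $\mathcal{D}_{\mathrm{accept}}$: the probability is at least

$$\Big(1-\frac{2}{R}\Big)\Big(1-\frac{4}{R}\Big)\cdots\Big(1-\frac{2N}{R}\Big) \ge 1-\frac{N(N+1)}{R} = 1-o(1) \quad \text{[by the Birthday Paradox].}$$

In both cases the last step uses $N = o(\sqrt{k}) = o(\sqrt{R})$, which holds because $k = r + R/2 = \Theta(R)$.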
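The birthday-paradox bounds above can be evaluated directly. This sketch (hypothetical helper names) computes the product for the reject case (`step=1`) and the accept case (`step=2`), together with the union-bound estimate $1 - \text{step}\cdot N(N+1)/(2R)$; for $N$ well below $\sqrt{R}$ both are close to $1$, while at $N \approx \sqrt{R}$ they are not.

```python
def no_repeat_lower_bound(N, R, step):
    """Product (1 - step/R)(1 - 2*step/R)...(1 - N*step/R), each factor clipped at 0.

    step = 1 models D_reject (R relevant y's, each returned w.p. 1/R);
    step = 2 models D_accept (R/2 relevant y's, each returned w.p. 2/R).
    """
    p = 1.0
    for t in range(1, N + 1):
        p *= max(0.0, 1 - step * t / R)
    return p


def union_bound(N, R, step):
    # Weierstrass product inequality: prod (1 - a_t) >= 1 - sum a_t when every a_t is in [0, 1]
    return 1 - step * N * (N + 1) / (2 * R)
```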

Now the crucial observation is that, whether the function is drawn from $\mathcal{D}_{\mathrm{reject}}$ or from $\mathcal{D}_{\mathrm{accept}}$, all sequences in $T$ are equally likely, by symmetry in the construction. To see this, simply consider the probability of receiving a fixed $(y_j, X)$ for some new $y_j$ in the next FS query of an unknown function drawn from either one of these distributions. Using the above calculations for $|\hat{f}(\{y_j\} \cup X)|^2$, one can directly calculate that these probabilities are equal in either scenario. Alternatively, for a function drawn from $\mathcal{D}_{\mathrm{accept}}$ one can observe that since each successive $y_j$ is "new", a fresh random bit determines whether the support is on pairs $(y_j, X)$ with $|X|$ odd or with $|X|$ even; once this is determined, the choice of $X$ is uniform over all subsets with the correct parity. Thus the overall draw of $(y_j, X)$ is uniform over all $X$'s. Considering that the set of $R/2$ relevant $y$-variables is chosen uniformly from $\{y_0,\dots,y_{n-r-1}\}$, this gives the equality of the probabilities for each $(y_j, X)$ with a new $y_j$ when the function is drawn from $\mathcal{D}_{\mathrm{accept}}$. The argument for the case of $\mathcal{D}_{\mathrm{reject}}$ is clear.

Consequently, since under both scenarios the sequence of outcomes lies in $T$ with probability $1-o(1)$ and is uniform on $T$, the statistical difference between the distributions of the sequence of outcomes of the $N$ FS oracle calls under the two scenarios is at most $o(1)$. Now Fact III.3 implies that no algorithm making only $N$ oracle calls can distinguish between these two scenarios with high probability. This gives us the result, and concludes the proof of Theorem III.5. ■

Intuitively, under either distribution on functions, each element of a sequence of $N$ FS oracle calls will "look like" a uniform random draw of $X$ from the subsets of $\{x_1,\dots,x_r\}$ and of $j$ from $\{0,\dots,n-r-1\}$, where $j$ and $X$ are independent. Note that this argument breaks down at $N = \Omega(\sqrt{R})$. This is because if the algorithm queries the FS oracle $\Theta(\sqrt{R})$ times, it will start to see some $y_j$ more than once with constant probability (again by the birthday paradox). But when the functions are drawn from $\mathcal{D}_{\mathrm{accept}}$ the corresponding $X$'s will always have a fixed parity for a given $y_j$, whereas for functions drawn from $\mathcal{D}_{\mathrm{reject}}$ the parity will be random each time. This will provide the algorithm with sufficient evidence to distinguish with constant probability between these two scenarios.
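The parity-collision strategy described in the last paragraph can be simulated, assuming the FS responses follow exactly the distributions computed above. In this sketch the samplers model the oracle abstractly (they are our assumption, not an implementation of a quantum FS oracle), subsets $X$ are encoded as $r$-bit integers, and all names are hypothetical. With $R = 1024$, about $N = 128 \approx 4\sqrt{R}$ queries should separate the cases with high probability, while $N = 4$ should not.

```python
import random


def parity(X):
    return bin(X).count("1") % 2


def fs_sample_reject(rng, relevant, r):
    # D_reject: each of the R relevant y's has mass 1/R; X is uniform over all 2^r subsets
    return rng.choice(relevant), rng.randrange(2 ** r)


def fs_sample_accept(rng, relevant, par, r):
    # D_accept: each of the R/2 relevant y's has mass 2/R; X is uniform over the
    # 2^(r-1) subsets whose parity equals the hidden bit par[j]
    j = rng.choice(relevant)
    while True:
        X = rng.randrange(2 ** r)
        if parity(X) == par[j]:
            return j, X


def parity_collision_test(samples):
    """Answer 'reject' iff some y_j is seen twice with subsets X of different parity."""
    seen = {}
    for j, X in samples:
        if seen.setdefault(j, parity(X)) != parity(X):
            return "reject"
    return "accept"


def run(kind, r, m, N, rng):
    R = 2 ** r
    if kind == "reject":
        relevant = rng.sample(range(m), R)
        samples = [fs_sample_reject(rng, relevant, r) for _ in range(N)]
    else:
        relevant = rng.sample(range(m), R // 2)
        par = {j: rng.randrange(2) for j in relevant}
        samples = [fs_sample_accept(rng, relevant, par, r) for _ in range(N)]
    return parity_collision_test(samples)


def reject_rate(kind, r, m, N, trials, seed=0):
    rng = random.Random(seed)
    return sum(run(kind, r, m, N, rng) == "reject" for _ in range(trials)) / trials
```

The test is one-sided: under $\mathcal{D}_{\mathrm{accept}}$ it never rejects, so all of its power comes from repeated $y_j$'s, which is why it needs $\Omega(\sqrt{R})$ queries.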

IV. LEARNING JUNTAS 
A. Known results 

The problem of learning an unknown $k$-junta has been well studied in the computational learning theory literature, see e.g. 0, [E, HSl. The following classical lower bound will be a yardstick against which we will measure our results.

Lemma IV.1 Any classical membership query algorithm for learning $k$-juntas to accuracy $1/5$ must use $\Omega(2^k + \log n)$ membership queries.

Proof: Consider the restricted problem of learning an unknown function $f(x)$ which is simply a single Boolean variable from $\{x_1,\dots,x_n\}$. Since any two variables disagree on half of all inputs, any $1/5$-learning algorithm can be easily modified into an algorithm that exactly learns an unknown variable with no more queries: a hypothesis within distance $1/5$ of the target variable is at distance at least $1/2 - 1/5 = 3/10$ from every other variable, so the target is the variable closest to it. It is well known that any set of $n$ concepts requires $\Omega(\log n)$ queries for any exact learning algorithm that uses membership queries only, see e.g. Q. This gives the $\Omega(\log n)$ lower bound.

For the $\Omega(2^k)$ lower bound, we may suppose that the algorithm "knows" that the junta has relevant variables $x_1,\dots,x_k$. Even in this case, if fewer than $\frac{1}{2}2^k$ membership queries are made, the learner will have no information about at least $1/2$ of the function's output values. A straightforward application of the Chernoff bound shows that it is very unlikely for such a learner's hypothesis to be $1/5$-accurate if the target junta is a uniform random function over the relevant variables. This establishes the result. ■
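The counting behind the $\Omega(2^k)$ bound can be illustrated numerically. In this sketch (our own illustration, not part of the proof) the target is a uniformly random truth table on $k$ relevant variables, exactly half of the entries are revealed, and the learner guesses $+1$ on the rest. Because the hidden entries are independent fair coins, any guessing rule errs on each hidden entry with probability $1/2$, giving error about $1/4$ overall, well above $1/5$.

```python
import random


def half_table_error(k, rng):
    """Error of a learner that sees only half of a uniformly random truth table on k variables."""
    size = 2 ** k
    table = [rng.choice((1, -1)) for _ in range(size)]   # uniform random junta on x_1..x_k
    seen = set(rng.sample(range(size), size // 2))        # the 2^k / 2 queried inputs
    # unseen entries: guess +1 (any fixed rule is wrong with probability 1/2 per entry)
    hypothesis = [table[a] if a in seen else 1 for a in range(size)]
    return sum(h != t for h, t in zip(hypothesis, table)) / size
```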
Learning juntas from uniform random examples $EX(f)$ is a seemingly difficult computational problem. Simple algorithms based on exhaustive search can learn from $O(2^k \log n)$ examples but require $\Omega(n^k)$ runtime. The fastest known algorithm in this setting, due to Mossel et al. [24], runs in time roughly $(n^k)^{\omega/(\omega+1)}$, where $\omega < 2.376$ is the matrix multiplication exponent.
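The text does not spell out the exhaustive-search learner; one natural version, given here as a hedged sketch with hypothetical names, tries every $k$-subset $T$ of the variables and returns the first one on which the examples are consistent (no two examples agree on $T$ but carry different labels). The loop over all $\binom{n}{k}$ subsets is the source of the $n^k$-type running time.

```python
import itertools
import random


def consistent_junta(examples, n, k):
    """Return (T, table) for the first k-subset T on which the labelled examples are consistent."""
    for T in itertools.combinations(range(n), k):
        table = {}
        for x, label in examples:
            key = tuple(x[i] for i in T)
            if table.setdefault(key, label) != label:
                break                      # two examples agree on T but disagree on the label
        else:
            return T, table                # hypothesis: look up the restriction of x to T
    return None


def random_examples(target, n, m, seed=0):
    """m uniform random labelled examples (x, target(x)) with x in {-1,1}^n."""
    rng = random.Random(seed)
    examples = []
    for _ in range(m):
        x = tuple(rng.choice((1, -1)) for _ in range(n))
        examples.append((x, target(x)))
    return examples
```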

Bshouty and Jackson [8] gave an algorithm using uniform quantum examples from the QEX oracle to learn general
DNF formulas. Their algorithm uses 0{ns^e~^) calls to QEX to learn an s-term DNF over n variables to accuracy 
e. Since any /c-junta is expressible as a DNF with at most 2^^^^ terms, their result immediately yields the following 
statement. 

Theorem IV. 2 (See [8]) There exists an e-learning quantum algorithm for k-juntas using 0(n2^'^'e^®) quantum 
examples under the uniform distribution quantum PAC model. 

Note that [8] did not try to optimize the quantum query complexity of their algorithms in the special case of learning
juntas. In contrast, our goal is to obtain a more efficient algorithm for juntas. 

The lower bound of [4, Observation 6.3] for learning with quantum membership queries for an arbitrary concept class can be rephrased for the purpose of learning $k$-juntas as follows.

Fact IV.3 (See [4]) Any algorithm for learning $k$-juntas to accuracy $\epsilon = 1/10$ with quantum membership queries must use $\Omega(2^k)$ queries.

Proof: Since we are proving a lower bound we may assume that the algorithm is told in advance that the junta depends on variables $x_1,\dots,x_k$. Consequently we may assume that the algorithm makes all its queries with nonzero amplitude only on inputs of the form $|x, 1^{n-k}\rangle$. [4, Observation 6.3] states that any quantum algorithm which makes queries only over a shattered set (as is the set of inputs $\{|x, 1^{n-k}\rangle\}_{x\in\{-1,1\}^k}$ for the class of $k$-juntas) must make at least $\text{VC-DIM}(C)/100$ QMQ queries to learn with error rate at most $\epsilon = 1/10$; here $\text{VC-DIM}(C)$ is the Vapnik-Chervonenkis dimension of concept class $C$. Since the VC dimension of the class of all Boolean functions over variables $x_1,\dots,x_k$ is $2^k$, the result follows. ■
This shows that a QMQ oracle cannot provide sufficient information to learn a $k$-junta to high accuracy using $o(2^k)$ queries. It is worth noting that there are other similar learning problems known where an $N$-query QMQ algorithm can exactly identify a target concept whose description length is $\omega(N)$ bits. For instance, a single FS oracle call (which can be implemented by a single QMQ query) can potentially give up to $k$ bits of information; if the concept class $C$ is the class of all $2^k$ parity functions over the first $k$ variables, then any concept in the class can be exactly learned by a single FS oracle call.
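The parity example can be checked with a classical, exponential-time simulation of the FS oracle. The real oracle is quantum; the simulation below only reproduces its output distribution $\Pr[S] = \hat f(S)^2$, and the function names are our own. Since the spectrum of a parity $\chi_S$ is a point mass at $S$, one simulated call always returns $S$.

```python
import random


def walsh_hadamard(values):
    """Fourier coefficients from values[z], z = 0..2^n - 1 (bit k of z set <=> z_k = -1)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return [c / len(a) for c in a]


def parity_values(S_mask, n):
    """Truth table of chi_S(z) = prod_{k in S} z_k under the same bit encoding."""
    return [-1 if bin(S_mask & z).count("1") % 2 else 1 for z in range(2 ** n)]


def fs_query(values, rng):
    """Classically simulated FS oracle: return the mask S with probability hat f(S)^2."""
    spectrum = walsh_hadamard(values)
    return rng.choices(range(len(spectrum)), weights=[c * c for c in spectrum])[0]
```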

Note that all the results we have discussed in this subsection concern algorithms with access to only one type of 
oracle; this is in contrast with the algorithm we present in the next section. 



B. A new learning algorithm 

The motivating question for this section is: "Is it possible to reduce the classical query/sample complexity drastically 
for the problem of junta learning if the learning algorithm is also permitted to have very limited quantum information?" 
We will give an affirmative answer to this question by describing a new algorithm that uses both FS queries (i.e. 
quantum examples) and classical uniform random examples. 

Lemma IV.4 Let $f : \{-1,1\}^n \to \{-1,1\}$ be a function whose value depends on the set of variables $I$. Then there is an algorithm querying the FS oracle $O(\epsilon^{-1}\log|I|)$ times which w.h.p. outputs a list of variables such that

• the list contains all the variables $x_i$ for which $\mathrm{Inf}_i(f) \ge \epsilon$; and

• all the variables $x_j$ in the list have non-zero influence: $\mathrm{Inf}_j(f) > 0$.

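Proof: The algorithm simply queries the FS oracle $N = O(\epsilon^{-1}\log|I|)$ many times and outputs the union of all the sets of variables received as responses to these queries. Each FS query returns a set $S$ with $\hat{f}(S) \neq 0$, and every $x_j \in S$ then satisfies $\mathrm{Inf}_j(f) = \sum_{U \ni x_j}\hat{f}(U)^2 \ge \hat{f}(S)^2 > 0$, which gives the second property.

If $\mathrm{Inf}_i(f) \ge \epsilon$ then, since a single FS query returns a set containing $x_i$ with probability $\sum_{S \ni x_i}\hat{f}(S)^2 = \mathrm{Inf}_i(f)$, the probability that $x_i$ never occurs in any response obtained from the $N$ FS oracle calls is at most $(1-\epsilon)^N \le e^{-\epsilon N} \le \frac{1}{10|I|}$ for $N = \lceil \epsilon^{-1}\ln(10|I|) \rceil$. The union bound over the at most $|I|$ such variables now yields that with probability at least $9/10$, every $x_i$ with $\mathrm{Inf}_i(f) \ge \epsilon$ is output by the algorithm. ■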
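The procedure in the proof of Lemma IV.4 can be sketched as follows, again simulating the FS oracle classically from the full Fourier spectrum (feasible only for small $n$). The choice $N = \lceil \epsilon^{-1}\ln(10|I|)\rceil$ is one concrete instance of $N = O(\epsilon^{-1}\log|I|)$; the example function and all names are our own. Influences are computed as $\mathrm{Inf}_i(f) = \sum_{S \ni x_i}\hat f(S)^2$.

```python
import math
import random


def walsh_hadamard(values):
    """Fourier coefficients from values[z], z = 0..2^n - 1 (bit k of z set <=> z_k = -1)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return [c / len(a) for c in a]


def find_influential(values, n, eps, num_relevant, rng):
    """Union of N simulated FS responses, N = ceil(ln(10|I|) / eps)."""
    spectrum = walsh_hadamard(values)
    N = math.ceil(math.log(10 * num_relevant) / eps)   # (1-eps)^N <= exp(-eps N) <= 1/(10|I|)
    found = set()
    for S in rng.choices(range(2 ** n), weights=[c * c for c in spectrum], k=N):
        found |= {i for i in range(n) if S >> i & 1}
    return found


def influences(values, n):
    """Exact influences Inf_i(f) = sum over S containing i of hat f(S)^2."""
    spectrum = walsh_hadamard(values)
    return [sum(c * c for S, c in enumerate(spectrum) if S >> i & 1) for i in range(n)]


def example_values(n=6):
    """Our example: f = x_0 x_1 when x_2 = 1 and f = x_0 when x_2 = -1."""
    vals = []
    for mask in range(2 ** n):
        z = [-1 if mask >> k & 1 else 1 for k in range(n)]
        vals.append(z[0] * z[1] if z[2] == 1 else z[0])
    return vals
```

Here the influences are $1, 1/2, 1/2$ for $x_0, x_1, x_2$ and $0$ for the rest, so the union of samples should be exactly $\{x_0, x_1, x_2\}$.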



Theorem IV.5 There is an efficient algorithm $\epsilon$-learning $k$-juntas with $O(\epsilon^{-1}k\log k)$ queries of the FS oracle and $O(2^k\log(\epsilon^{-1}))$ random examples.
Algorithm 1 The junta learning algorithm.

1: Input: $\epsilon > 0$, FS($f$), EX($f$).

2: Stage 1:

3: Construct a set containing all variables of $f$ with an influence at least $\epsilon/(10k)$ using the algorithm in Lemma IV.4. Let $A$ be the final result.

4: For all $a \in \{-1,1\}^{|A|}$, encountered($a$) $\leftarrow$ False.

5: Stage 2:

6: repeat

7: $(x, f(x)) \leftarrow$ Draw from EX($f$). Let $x|_A$ denote the projection of $x$ onto the variables in $A$.
8: if encountered($x|_A$) = False then
9: value($x|_A$) $\leftarrow f(x)$, encountered($x|_A$) $\leftarrow$ True.
10: end if

11: until For at least a $(1-\epsilon/3)$ fraction of all $a \in \{-1,1\}^{|A|}$, encountered($a$) = True.

12: Output the hypothesis: $H(x) =$ value($x|_A$) if encountered($x|_A$) = True, and True otherwise.
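A minimal end-to-end Python sketch of Algorithm 1, assuming both oracles are simulated classically: FS($f$) by sampling $S$ with probability $\hat f(S)^2$ from the exact spectrum, and EX($f$) by drawing uniform inputs. Inputs are integers whose bit $i$ is set when $x_i = -1$; Stage 1 uses $N = \lceil 10k\ln(10k)/\epsilon\rceil$ FS samples, one concrete instance of Lemma IV.4 with threshold $\epsilon/(10k)$ and $|I| \le k$; the default $+1$ for unseen assignments is our reading of "True" in step 12. All names are hypothetical.

```python
import math
import random


def walsh_hadamard(values):
    """Fourier coefficients from values[z], z = 0..2^n - 1 (bit k of z set <=> z_k = -1)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return [c / len(a) for c in a]


def learn_junta(values, n, k, eps, rng):
    """Sketch of Algorithm 1; values[x] is f on the input encoded by the integer x."""
    # Stage 1: union of N simulated FS responses, N = O(eps^-1 k log k)
    spectrum = walsh_hadamard(values)
    N = math.ceil(10 * k * math.log(10 * k) / eps)
    A = set()
    for S in rng.choices(range(2 ** n), weights=[c * c for c in spectrum], k=N):
        A |= {i for i in range(n) if S >> i & 1}
    A = sorted(A)

    def project(x):
        # x|_A: the bits of x on the variables in A
        return tuple(x >> i & 1 for i in A)

    # Stage 2: uniform examples until a (1 - eps/3) fraction of assignments to A is seen;
    # the first example seen for each assignment supplies value(a)
    value = {}
    target = math.ceil((1 - eps / 3) * 2 ** len(A))
    while len(value) < target:
        x = rng.getrandbits(n)               # one draw from EX(f)
        value.setdefault(project(x), values[x])

    def hypothesis(x):
        return value.get(project(x), 1)      # unseen assignment: default +1 ("True")

    return A, hypothesis


def example_junta_values(n=8):
    """Our example 3-junta on variables 1, 4, 6: x_1 x_4 if x_6 = 1, else x_1."""
    vals = []
    for x in range(2 ** n):
        z = [-1 if x >> i & 1 else 1 for i in range(n)]
        vals.append(z[1] * z[4] if z[6] == 1 else z[1])
    return vals
```

For this example every relevant variable has influence at least $1/2$, so Stage 1 should find all of them, and with $|A| = 3$ Stage 2 waits until all $8$ assignments are seen, making the hypothesis exact.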



Proof: We claim Algorithm 1 satisfies these requirements.

Assume we are given a Boolean function $f$ whose value depends on the set of variables $I$ with $|I| \le k$. By Lemma IV.4, $O(\epsilon^{-1}k\log k)$ queries of the FS oracle will reveal all variables with influence at least $\epsilon/(10k)$ with high probability during Stage 1.

Assuming the algorithm of Lemma IV.4 was successful, we group the variables as follows:

Group   Description
A       The set of variables encountered in Stage 1.
B       The set of relevant variables $I \setminus A$.
C       The remaining $n - |I|$ variables, on which the function does not depend.

Note that $|A| + |B| \le k$, since $A \subseteq I$ by Lemma IV.4 and $f$ is a $k$-junta.

We reorder the variables of $f$ so that the new order is $A, B, C$ for notational simplicity, i.e. $f$ is now considered to be over $(a_1,\dots,a_{|A|},b_1,\dots,b_{|B|},c_1,\dots,c_{|C|})$. We will denote an assignment to these variables by $(a, b, c)$.

In Stage 2 the algorithm draws random examples until at least a $(1-\epsilon/3)$ fraction of all assignments to the variables in $A$ are observed. Let us call this set of assignments $S$, and for every $a \in S$, let us denote the first example $(x, f(x))$ drawn in Stage 2 for which $x|_A = a$ by $x = (a, b^a, c^a)$. At the end of the algorithm, the following hypothesis is produced as the output:

$$H(x) = \begin{cases} \text{value}(x|_A) & \text{if encountered}(x|_A) = \text{True},\\ \text{True} & \text{otherwise.} \end{cases}$$
In other words, the value of the hypothesis only depends on the setting of the variables in $A$. Observe that the probability that any given setting of the variables in $A$ has not been seen can be made less than $\epsilon/50$ using $O(2^k\log(\epsilon^{-1}))$ uniform random examples: a fixed setting is missed by $m$ examples with probability $(1-2^{-|A|})^m \le e^{-m/2^{|A|}}$, which is at most $\epsilon/50$ once $m \ge 2^{|A|}\ln(50/\epsilon)$, and $|A| \le k$. Therefore the linearity of expectation implies that after $O(2^k\log(\epsilon^{-1}))$ random examples, the expected fraction of unseen assignments is less than $\epsilon/50$. Thus by Markov's Inequality the fraction of unseen assignments will be less than $\epsilon/3$ with probability at least $1 - (\epsilon/50)/(\epsilon/3) = 47/50$. Hence Stage 2 will terminate w.h.p. after $O(2^k\log(\epsilon^{-1}))$ random examples. Consequently, the whole algorithm terminates with high probability with the desired query consumption. All we need to verify is that the hypothesis constructed is $\epsilon$-accurate.
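The Stage 2 sample count can be checked numerically. This sketch computes the smallest $m$ with $e^{-m/2^{|A|}} \le \epsilon/50$ and simulates the fraction of assignments to $A$ that remain unseen after $m$ uniform examples; the names and parameters ($|A| = 10$, $\epsilon = 0.1$) are illustrative choices of ours.

```python
import math
import random


def stage2_sample_bound(a_size, eps):
    """Smallest m with exp(-m / 2^|A|) <= eps/50, i.e. m >= 2^|A| * ln(50/eps)."""
    return math.ceil(2 ** a_size * math.log(50 / eps))


def unseen_fraction(a_size, m, rng):
    """Fraction of the 2^|A| assignments to A not hit by m uniform random examples."""
    seen = {rng.getrandbits(a_size) for _ in range(m)}
    return 1 - len(seen) / 2 ** a_size
```

For $|A| = 10$ and $\epsilon = 0.1$ this gives $m = \lceil 1024\ln 500\rceil = 6364$, and the simulated unseen fraction stays far below the stopping threshold $\epsilon/3$.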

The hypothesis $H$ is $\epsilon$-accurate with high probability:

We introduce some notation: Let $\mathbb{B} = \{-1,1\}$; given two strings $u, v \in \mathbb{B}^\ell$, let $u \odot v$ denote their bitwise product, and let $|u|$ denote the total number of $-1$'s in $u$. Also let $\mathbf{1}[P]$ denote the indicator function that takes value $1$ if $P$ holds and value $0$ if $P$ is false.

We start with the following fact:

Fact IV.6 For any $s \in \mathbb{B}^{|B|}$, we have
$$\frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b\odot s,c)\neq f(a,b,c)] < \epsilon/10.$$

Proof: Given any string $s \in \mathbb{B}^{|B|}$, clearly there exists a sequence of $|s|+1$ strings
$$1^{|B|} = u^1, u^2, \dots, u^{|s|+1} = s, \quad\text{where } u^i \in \mathbb{B}^{|B|} \text{ and } |u^i \odot u^{i+1}| = 1 \text{ for } i = 1,\dots,|s|.$$
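Therefore, for any $s \in \mathbb{B}^{|B|}$,
$$\begin{aligned}
\frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b\odot s,c)\neq f(a,b,c)]
&\le \frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\sum_{i=1}^{|s|}\mathbf{1}[f(a,b\odot u^{i+1},c)\neq f(a,b\odot u^{i},c)]\\
&= \sum_{i=1}^{|s|}\Big(\frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b\odot u^{i}\odot u^{i+1},c)\neq f(a,b,c)]\Big)\\
&< \epsilon/10.
\end{aligned}$$
The first inequality holds because $b \odot u^1 = b$ and $b \odot u^{|s|+1} = b \odot s$, so if the two endpoints disagree then some consecutive pair does; the equality follows by substituting $b \mapsto b \odot u^i$ in the $i$-th term. Each term in parentheses is the influence of the unique variable that takes value $-1$ in $u^i \odot u^{i+1}$, and the final bound holds since every $b_j \in B$ has influence less than $\frac{\epsilon}{10k}$ and $|B| \le k$.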
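The path argument above amounts to the inequality $\Pr_z[f(z) \neq f(z \odot s)] \le \sum_{j : s_j = -1}\mathrm{Inf}_j(f)$. The following brute-force check on a random Boolean function is a sanity check under our own choice of parameters ($n = 6$, random flip sets), not part of the proof; the names are hypothetical.

```python
import itertools
import random


def flip(z, positions):
    """Multiply z coordinatewise by the string that is -1 exactly on `positions`."""
    return tuple(-v if k in positions else v for k, v in enumerate(z))


def disagreement(f, n, positions):
    """Pr_z[f(z) != f(z with the coordinates in `positions` flipped)], by enumeration."""
    points = list(itertools.product((1, -1), repeat=n))
    return sum(f(z) != f(flip(z, positions)) for z in points) / len(points)


def check_path_bound(n=6, trials=50, seed=0):
    rng = random.Random(seed)
    table = {z: rng.choice((1, -1)) for z in itertools.product((1, -1), repeat=n)}
    f = table.__getitem__
    influence = [disagreement(f, n, {j}) for j in range(n)]   # Inf_j(f)
    for _ in range(trials):
        positions = {j for j in range(n) if rng.random() < 0.5}
        # hybrid argument: flip the coordinates one at a time and apply the union bound
        assert disagreement(f, n, positions) <= sum(influence[j] for j in positions) + 1e-12
    return influence
```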



For each $a \in \mathbb{B}^{|A|}$, consider a fixed setting of strings $b^a \in \mathbb{B}^{|B|}$, $c^a \in \mathbb{B}^{|C|}$. Let us call the list of all these assignments $\Gamma$, i.e. $\Gamma = \{(a, b^a, c^a) : a \in \mathbb{B}^{|A|}\}$. For any such "list of assignments" $\Gamma$, we define the function $F_\Gamma : \{-1,1\}^n \to \{-1,1\}$ as follows: $F_\Gamma(a, *, *) = f(a, b^a, c^a)$. The error incurred by approximating $f$ by $F_\Gamma$ is:

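$$\begin{aligned}
\Pr_{(a,b,c)}[F_\Gamma(a,b,c)\neq f(a,b,c)] &= \Pr_{(a,b,c)}[f(a,b^a,c^a)\neq f(a,b,c)]\\
&= \Pr_{(a,b,c)}[f(a,b^a,c)\neq f(a,b,c)] \qquad [\text{since } f \text{ does not depend on the variables in } C]\\
&= \frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b^a,c)\neq f(a,b,c)] = \frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{s\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b^a,c)\neq f(a,b^a\odot s,c)]
\end{aligned}\tag{IV.1}$$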

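Therefore if we consider the expected value of the incurred error $\Pr[F_\Gamma \neq f]$ over all "lists of assignments" $\Gamma$ (each $b^a$ uniform over $\mathbb{B}^{|B|}$, independently for each $a$), Equation (IV.1) implies that:
$$\mathbb{E}_\Gamma\big[\Pr_{(a,b,c)}[F_\Gamma\neq f]\big] = \frac{1}{2^{|B|}}\sum_{s\in\mathbb{B}^{|B|}}\Big(\frac{1}{2^n}\sum_{a\in\mathbb{B}^{|A|}}\sum_{b^a\in\mathbb{B}^{|B|}}\sum_{c\in\mathbb{B}^{|C|}}\mathbf{1}[f(a,b^a\odot s,c)\neq f(a,b^a,c)]\Big) < \epsilon/10,$$
since each term in parentheses is less than $\epsilon/10$ due to Fact IV.6.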

Consequently, the expected error of approximating $f$ by a uniformly chosen $F_\Gamma$ is less than $\epsilon/10$. This also implies that for a uniformly chosen subset $S$ of assignments to variables in $A$ with size $(1-\epsilon/3)2^{|A|}$, the expected error over $S$ satisfies $\mathbb{E}_\Gamma\big[\Pr_{(a,b,c),\,a\in S}[F_\Gamma\neq f]\big] < \epsilon/10$. Therefore by Markov's Inequality, we obtain the following observation:

Observation IV.7 For a uniformly chosen subset $S$ and $F_\Gamma$ as described above, $F_\Gamma$ will agree with $f$ on at least a $(1-\epsilon/3)$ fraction of the coordinates $\{(a,b,c) : a \in S\}$ with probability at least $7/10$.

Now if we go back and recall what the algorithm does in Stage 2, we will observe that the generation of the hypothesis in Stage 2 is equivalent to drawing a uniform $F_\Gamma$ and $S$ as described and resetting the values of $F_\Gamma$ at those coordinates $\{(a,b,c) : a \notin S\}$ to True. This is because the algorithm only draws classical uniform random examples during Stage 2, so the first example seen with $x|_A = a$ has uniformly random $(b^a, c^a)$. Therefore, due to Observation IV.7, the hypothesis will disagree with $f$ on at most a

$$\underbrace{1-(1-\epsilon/3)^2}_{\text{error incurred by }(a,b,c),\ a\in S} + \underbrace{\epsilon/3}_{\text{error incurred by }(a,b,c),\ a\notin S} = \epsilon - \frac{\epsilon^2}{9} < \epsilon$$

fraction of the inputs, with overall probability at least $2/3$. This gives the desired result.
Note that this algorithm 
• uses only a moderate number of quantum examples;

• has overall query complexity with no dependence on $n$, in contrast with known lower bounds (Lemma IV.1) for learning from classical membership queries;

• uses the EX oracle as its only source of classical information (MQ queries are not used); and 

• is computationally efficient. 

One can compare this result to that of Theorem IV.2, whose bound on the number of quantum examples needed to learn $k$-juntas grows linearly with $n$. In contrast, our algorithm uses not only substantially fewer quantum examples but also fewer uniform random examples, which are considered quite cheap. Intuitively, this means that for the junta learning problem, almost all the quantum queries used by the algorithm of Bshouty and Jackson [8] can in fact be converted into ordinary classical random examples.

1. Lower bounds 

The algorithm of Theorem IV.5 is optimal in the following sense:

Observation IV.8 Any $1/10$-learning quantum membership query algorithm for $k$-juntas that uses only $c \cdot 2^k$ classical MQ queries, for a sufficiently small constant $c > 0$, must additionally use $\Omega(2^k)$ QMQ queries.

Proof: This statement easily follows from Fact IV.3, since a classical membership query can be simulated by a QMQ query. ■
Contrasting our junta learning algorithm with Observation IV.8, we see that if the allowed number of classical examples or queries is decreased even slightly from the $O(2^k\log\epsilon^{-1})$ used by our algorithm to $c \cdot 2^k$, then an additional $\Omega(2^k)$ quantum queries are required, even if QMQ queries are allowed.

V. CONCLUSION 

We have given some results on learning and testing $k$-juntas using both quantum examples and classical random examples. It would be interesting to develop other testing and learning algorithms that combine these two sorts of oracles, with the goal of minimizing the number of quantum oracle calls required.

Another interesting goal for future work is to further explore the power of the FS oracle. Can the gap between our $O(k/\epsilon)$-query upper bound and our $\Omega(\sqrt{k})$-query lower bound for the FS oracle be closed?